The performance of deep learning (DL) models depends on the quality of labels. In some areas, the involvement of human annotators may lead to noise in the data. When these corrupted labels are blindly regarded as the ground truth (GT), DL models suffer from degraded performance. This paper presents a method that aims to learn a confident model in the presence of noisy labels. This is done in conjunction with estimating the uncertainty of multiple annotators. We robustly estimate the predictions given only the noisy labels by adding an entropy- or information-based regularizer to the classifier network. We conduct our experiments on noisy versions of the MNIST, CIFAR-10, and FMNIST datasets. Our empirical results demonstrate the robustness of our method, as it outperforms or performs comparably to other state-of-the-art (SOTA) methods. In addition, we evaluate the proposed method on a curated dataset, where the noise type and level of various annotators depend on the input image style. We show that our approach performs well and is adept at learning annotators' confusion. Moreover, we demonstrate that our model is more confident in predicting the GT than other baselines. Finally, we assess our approach on a segmentation problem and showcase its effectiveness with experiments.
translated by 谷歌翻译
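When oncologists estimate cancer patient survival, they rely on multimodal data. Although some multimodal deep learning methods have been proposed in the literature, most rely on two or more independent networks that share knowledge at a later stage of the overall model. Oncologists, by contrast, do not work this way in their analysis; they fuse information from multiple sources, such as medical images and patient history, in their minds. This work proposes a deep learning method that mimics oncologists' analytical behavior when quantifying cancer and estimating patient survival. We propose TMSS, an end-to-end transformer-based multimodal network for segmentation and survival prediction that leverages the main strength of transformers, namely their ability to handle different modalities. The model was trained and validated for the segmentation and prognosis tasks on the training dataset of the HEad and neCK TumOR segmentation and outcome prediction in PET/CT images challenge (HECKTOR). We show that the proposed prognostic model significantly outperforms state-of-the-art methods with a concordance index of 0.763 +/- 0.14, while achieving a Dice score of 0.772 +/- 0.030, comparable to that of a standalone segmentation model. The code is publicly available.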
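For readers unfamiliar with the concordance index reported for the prognosis task, the following is a minimal sketch of Harrell's C-index in plain Python. It is not the authors' evaluation code: the function name `concordance_index`, the handling of tied times (skipped), and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, risks, events):
    """Simplified Harrell's C-index (illustrative, not the paper's code).

    times  : observed follow-up times
    risks  : predicted risk scores (higher means an earlier expected event)
    events : 1 if the event was observed, 0 if the sample is censored
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk for the earlier event: correct order
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from HECKTOR.
times = [5, 8, 12, 20]
risks = [0.9, 0.4, 0.6, 0.1]
events = [1, 1, 0, 1]
c_index = concordance_index(times, risks, events)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of patients by risk.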
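Learning spatiotemporal features is an important task for efficient video understanding, especially in medical images such as echocardiograms. Convolutional neural networks (CNNs) and the more recent vision transformers (ViTs) are the most commonly used methods, each with its own limitations. CNNs are good at capturing local context but fail to learn global information across video frames. Vision transformers, on the other hand, can incorporate global details and long sequences but are computationally expensive and typically require more data to train. In this paper, we propose a method that addresses the limitations we typically face when training on medical video data such as echocardiographic scans. The proposed algorithm (EchoCoTr) combines the strengths of vision transformers and CNNs to tackle the problem of estimating the left ventricular ejection fraction (LVEF) from ultrasound videos. We demonstrate how the proposed method performs on the EchoNet-Dynamic dataset, achieving an MAE of 3.95 and an $R^2$ of 0.82. These results show a clear improvement over all published research. In addition, we present extensive ablations and comparisons with several algorithms, including ViT and BERT. The code is available at https://github.com/biomedia-mbzuai/echocotr.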
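Vision Transformers (ViTs) are competing to replace Convolutional Neural Networks (CNNs) for various computer vision tasks in medical imaging, such as classification and segmentation. While the vulnerability of CNNs to adversarial attacks is a well-known problem, recent works have shown that ViTs are also susceptible to such attacks and suffer significant performance degradation under attack. The vulnerability of ViTs to carefully engineered adversarial samples raises serious concerns about their safety in clinical settings. In this paper, we propose a novel self-ensembling method to enhance the robustness of ViTs in the presence of adversarial attacks. The proposed Self-Ensembling Vision Transformer (SEViT) leverages the fact that feature representations learned by the initial blocks of a ViT are relatively unaffected by adversarial perturbations. Learning multiple classifiers based on these intermediate feature representations and combining their predictions with that of the final ViT classifier can provide robustness against adversarial attacks. Measuring the consistency between the various predictions can also help detect adversarial samples. Experiments on two modalities (chest X-ray and fundoscopy) demonstrate the efficacy of the SEViT architecture in defending against various adversarial attacks in the gray-box setting, where the attacker has full knowledge of the target model but not of the defense mechanism. Code: https://github.com/faresmalik/sevit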
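The idea of combining intermediate classifiers with the final classifier, and using their consistency to flag suspicious inputs, can be sketched as follows. This is a hedged illustration only: the majority-vote rule, the agreement-rate consistency score, the 0.5 threshold, and the names `ensemble_predict` and `is_suspicious` are assumptions for this sketch and are not taken from the SEViT repository.

```python
from collections import Counter


def argmax(row):
    """Index of the largest value in a list of class probabilities."""
    return max(range(len(row)), key=row.__getitem__)


def ensemble_predict(probs):
    """Majority vote over per-classifier probability rows.

    probs: list of rows, one per classifier; the LAST row is assumed to be
    the final classifier and the earlier rows the intermediate classifiers.
    Returns (voted_label, agreement), where agreement is the fraction of
    intermediate classifiers that agree with the final classifier.
    """
    if len(probs) < 2:
        raise ValueError("need at least one intermediate and one final row")
    votes = [argmax(row) for row in probs]
    # Ties between equally voted labels go to the label seen first.
    label = Counter(votes).most_common(1)[0][0]
    final_vote = votes[-1]
    intermediate = votes[:-1]
    agreement = sum(v == final_vote for v in intermediate) / len(intermediate)
    return label, agreement


def is_suspicious(agreement, threshold=0.5):
    """Flag an input when too few intermediate classifiers back the final one.

    The threshold is a hypothetical value chosen for illustration.
    """
    return agreement < threshold


# Hypothetical two-class outputs: three intermediate rows, then the final row.
clean = [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9], [0.25, 0.75]]
attacked = [[0.2, 0.8], [0.3, 0.7], [0.4, 0.6], [0.9, 0.1]]

clean_label, clean_agreement = ensemble_predict(clean)
attacked_label, attacked_agreement = ensemble_predict(attacked)
```

In the second toy input the final classifier flips to class 0 while the intermediate classifiers still vote for class 1, so the vote recovers class 1 and the low agreement marks the sample as suspicious.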
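Rapid and accurate diagnosis is essential to mitigating the impact of COVID-19 infection, especially in severe cases. Enormous effort has gone into developing deep learning methods to classify and detect COVID-19 infections from chest radiography images. Recently, however, questions have been raised about the clinical viability and effectiveness of such methods. In this work, we investigate the impact of multi-task learning (classification and segmentation) on the ability of CNNs to differentiate between the various appearances of COVID-19 infection in the lung. We also employ self-supervised pre-training approaches, namely MoCo and inpainting-CXR, to eliminate the dependence on expensive ground-truth annotations for COVID-19 classification. Finally, we critically evaluate the models to assess their readiness for deployment and provide insights into the difficulties of fine-grained COVID-19 multi-class classification from chest X-rays.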
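Contrastive learning has proven useful in many applications where access to labelled data is limited. The lack of annotated data is particularly problematic in medical image segmentation, as it is difficult to have clinical experts manually annotate large volumes of data, such as cardiac structures in ultrasound images of the heart. In this paper, we examine whether contrastive pretraining is helpful for segmenting the left ventricle in echocardiography images. Furthermore, we study the effect of contrastive pretraining on two well-known segmentation networks, UNet and DeepLabV3. Our results show that contrastive pretraining helps improve performance on left ventricle segmentation, particularly when annotated data is scarce. We show how to achieve results comparable to state-of-the-art fully supervised algorithms when the models are trained in a self-supervised fashion and then fine-tuned on just 5% of the data. We show that our solution outperforms what is currently published on a large public dataset (EchoNet-Dynamic), achieving a Dice score of 0.9211. We also compare the performance of our solution on another, smaller dataset (CAMUS) to demonstrate the generalizability of the proposed solution. The code is available at (https://github.com/biomedia-mbzuai/contrastive-echo).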
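The Dice score used above to report segmentation quality can be computed as in the following minimal sketch. It assumes binary masks flattened to 0/1 sequences; the function name `dice_score`, the convention of returning 1.0 for two empty masks, and the toy masks are assumptions for illustration rather than the authors' evaluation code.

```python
def dice_score(pred, target):
    """Dice coefficient 2*|A and B| / (|A| + |B|) for two binary masks.

    pred, target: equal-length sequences of 0/1 values (flattened masks).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2.0 * intersection / total


# Hypothetical flattened masks, not taken from EchoNet-Dynamic or CAMUS.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 0, 1]
dice = dice_score(pred, target)
```

Here the masks share two foreground pixels out of three each, so the score is 2 * 2 / (3 + 3), roughly 0.667.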
Deep neural networks (DNNs) are vulnerable to a class of attacks called "backdoor attacks", which create an association between a backdoor trigger and a target label the attacker is interested in exploiting. A backdoored DNN performs well on clean test images, yet persistently predicts an attacker-defined label for any sample in the presence of the backdoor trigger. Although backdoor attacks have been extensively studied in the image domain, very few works explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective in the video domain. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects into that model. We show that poisoned-label image backdoor attacks can be extended temporally in two ways, statically and dynamically, leading to highly effective attacks in the video domain. In addition, we explore natural video backdoors to highlight the seriousness of this vulnerability in the video domain. Finally, for the first time, we study multi-modal (audiovisual) backdoor attacks against video action recognition models, where we show that attacking a single modality is enough to achieve a high attack success rate.
Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and ability to provide services collaboratively and autonomously. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms, such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surfaces (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems in using DL in UAV swarms and future research directions for DL-enabled UAV swarms. In summary, this survey provides a comprehensive review of DL applications for UAV swarms across a wide range of scenarios.
Compared to regular cameras, Dynamic Vision Sensors, or Event Cameras, asynchronously output compact visual data based on changes in intensity at each pixel location. In this paper, we study the application of current image-based SLAM techniques to these novel sensors. To this end, the information in adaptively selected event windows is processed to form motion-compensated images. These images are then used to reconstruct the scene and estimate the 6-DOF pose of the camera. We also propose an inertial version of the event-only pipeline to assess its capabilities. We compare the results of different configurations of the proposed algorithm against the ground truth for sequences from two publicly available event datasets. We also compare the results of the proposed event-inertial pipeline with the state of the art and show that it can produce comparable or more accurate results, provided the map estimate is reliable.
With Twitter's growth and popularity, users share a huge number of views on various topics, making the platform a valuable source of information on political, social, and economic issues. This paper investigates English tweets on the Russia-Ukraine war to analyze trends reflecting users' opinions and sentiments regarding the conflict. The tweets' positive and negative sentiments are analyzed using a BERT-based model, and the time series of the frequency of positive and negative tweets is calculated for various countries. Then, we propose a method based on the neighborhood average for modeling and clustering the countries' time series. The clustering results provide valuable insight into public opinion regarding this conflict. Among other findings, users from the United States, Canada, the United Kingdom, and most Western European countries hold similar views, in contrast to the shared views of users from Eastern European, Scandinavian, Asian, and South American nations.